neural variational inference and learning
Neural Variational Inference and Learning in Undirected Graphical Models
Many problems in machine learning are naturally expressed in the language of undirected graphical models. Here, we propose black-box learning and inference algorithms for undirected models that optimize a variational approximation to the log-likelihood of the model. Central to our approach is an upper bound on the log-partition function parametrized by a function q that we express as a flexible neural network. Our bound makes it possible to track the partition function during learning, to speed-up sampling, and to train a broad class of hybrid directed/undirected models via a unified variational inference framework. We empirically demonstrate the effectiveness of our method on several popular generative modeling datasets.
Reviews: Neural Variational Inference and Learning in Undirected Graphical Models
In this paper the authors essentially propose to train a MLP to generate proposal samples which are used to estimate the partition function Z of an undirected model. Instead of using straight importance sampling to estimate Z (which would be an unbiased estimator for Z), they propose a bound that overestimates Z 2 *in expectation*. While the authors highlight around line 70 that this only works when q is sufficiently close to p, I think it should be made even clearer that almost any estimate with a finite number of samples will *underestimate* Z 2 when q is not sufficiently close. I agree with the authors that this is probably not an issue at the beginning of training -- but I imagine it becomes an issue as p becomes multimodal/peaky towards convergence, when q cannot follow that distribution anymore. Which begs the question: Why would we train an undirected model p, when the training and evaluation method breaks down around the point when the jointly trained and properly normalized proposal distribution q cannot follow it anymore?
Neural Variational Inference and Learning in Undirected Graphical Models
Kuleshov, Volodymyr, Ermon, Stefano
Many problems in machine learning are naturally expressed in the language of undirected graphical models. Here, we propose black-box learning and inference algorithms for undirected models that optimize a variational approximation to the log-likelihood of the model. Central to our approach is an upper bound on the log-partition function parametrized by a function q that we express as a flexible neural network. Our bound makes it possible to track the partition function during learning, to speed-up sampling, and to train a broad class of hybrid directed/undirected models via a unified variational inference framework. We empirically demonstrate the effectiveness of our method on several popular generative modeling datasets.
Neural Variational Inference and Learning in Belief Networks
Highly expressive directed latent variable models, such as sigmoid belief networks, are difficult to train on large datasets because exact inference in them is intractable and none of the approximate inference methods that have been applied to them scale well. We propose a fast non-iterative approximate inference method that uses a feedforward network to implement efficient exact sampling from the variational posterior. The model and this inference network are trained jointly by maximizing a variational lower bound on the log-likelihood. Although the naive estimator of the inference network gradient is too high-variance to be useful, we make it practical by applying several straightforward modelindependent variance reduction techniques. Applying our approach to training sigmoid belief networks and deep autoregressive networks, we show that it outperforms the wake-sleep algorithm on MNIST and achieves state-of-the-art results on the Reuters RCV1 document dataset.
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)